Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing

Abstract

Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area, but their recognition performance can still be improved. Our approach combines support vector machines (SVMs) and conditional random fields (CRFs), which complement and facilitate each other. In the hybrid process, we first use an SVM to separate biological terms from non-biological terms and then use CRFs to determine the types of the biological terms, which makes full use of the power of the SVM as a binary classifier and the data-labeling capacity of CRFs. We then merge the results of the SVM and the CRFs. To remove inconsistencies that might result from the merging, we develop an algorithm and apply two rules. To ensure that biological terms of maximal length are identified, we propose a maximal bidirectional squeezing approach that finds the longest term. We also add a positive gain to rare events to reinforce their probability and avoid bias. Our approach also gradually extends the context so that more contextual information can be included. We evaluated four approaches on the GENIA corpus and the JNLPBA04 data. Combining SVM and CRFs improved performance: the macro-precision, macro-recall, and macro-F1 of the SVM-CRFs hybrid approach surpassed those of the conventional SVM and CRFs. After applying the new algorithms, the macro-F1 reached 91.67% on the GENIA corpus and 84.04% on the JNLPBA04 data.
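
The abstract reports macro-precision, macro-recall, and macro-F1 but does not spell out how they are computed. As a minimal sketch, assuming the common convention that each score is computed per entity type and then averaged with equal weight per type, the calculation could look like the following. The function name `macro_scores`, the toy labels, and the choice to average per-type F1 values (rather than combine macro-precision and macro-recall) are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter


def macro_scores(gold, pred):
    """Return (macro_precision, macro_recall, macro_f1) for aligned label lists.

    Assumption: every item (token or entity) has exactly one gold label and
    one predicted label, and each label type counts equally in the average.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("at least one labeled item is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for type g
        else:
            fp[p] += 1      # type p was predicted wrongly
            fn[g] += 1      # type g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with three hypothetical entity types.
gold = ["protein", "protein", "DNA", "DNA", "cell_type", "cell_type"]
pred = ["protein", "DNA", "DNA", "DNA", "cell_type", "protein"]
macro_p, macro_r, macro_f1 = macro_scores(gold, pred)
print(f"macro-P={macro_p:.4f} macro-R={macro_r:.4f} macro-F1={macro_f1:.4f}")
```

Because every type contributes equally to the average, a rare entity type influences the macro scores as much as a frequent one, which is why macro-averaging is often paired with techniques that guard against bias toward frequent classes.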

Bibliographic details

  • Authors: Zhu, Fei; Shen, Bairong
  • Year: 2012
  • Original format: PDF
  • Language: English
